Overview

Dataset Statistics

Number of Variables 10
Number of Rows 32951
Missing Cells 2448
Missing Cells (%) 0.7%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 7.0 MB
Average Row Size in Memory 224.1 B
Variable Types
  • Numerical: 8
  • Categorical: 2
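The Missing Cells figure is the sum of the per-variable counts reported below. Four columns have 610 missing values each and four have 2 each, so 4 × 610 + 4 × 2 = 2448. Spread over 32951 × 10 cells, that gives 2448 / 329510 ≈ 0.74%, which is shown rounded as 0.7%. The following is a minimal Python sketch of these overview counts. It assumes the data is held as a list of dicts with None marking a missing cell, and the function name `overview_stats` is hypothetical.

```python
# Hypothetical helper: rows are dicts, None marks a missing cell (assumption).
from typing import Any


def overview_stats(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Compute missing-cell and duplicate-row counts like the Overview table."""
    n_rows = len(rows)
    n_vars = len(rows[0]) if rows else 0
    missing = sum(1 for row in rows for value in row.values() if value is None)
    # A row is a duplicate if all of its (column, value) pairs match an earlier row.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    total_cells = n_rows * n_vars
    return {
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
    }
```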

Dataset Insights

  • index is uniformly distributed (Uniform)
  • product_category_name has 610 (1.85%) missing values (Missing)
  • product_name_lenght has 610 (1.85%) missing values (Missing)
  • product_description_lenght has 610 (1.85%) missing values (Missing)
  • product_photos_qty has 610 (1.85%) missing values (Missing)
  • index is skewed (Skewed)
  • product_name_lenght is skewed (Skewed)
  • product_photos_qty is skewed (Skewed)
  • product_weight_g is skewed (Skewed)
  • product_length_cm is skewed (Skewed)
  • product_width_cm is skewed (Skewed)
  • product_id has a high cardinality: 32951 distinct values (High Cardinality)
  • product_category_name has a high cardinality: 73 distinct values (High Cardinality)
  • product_id has constant length 32 (Constant Length)
  • product_id has all distinct values (Unique)
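Each insight above comes from a simple rule applied to one column. For example, 610 missing values out of 32951 rows gives 100 × 610 / 32951 ≈ 1.85%. The following is a hedged Python sketch of how such flags could be computed for a single categorical column. The threshold and the function name `categorical_insights` are hypothetical and are not the report tool's actual logic.

```python
# Hypothetical cut-off: the report does not state its high-cardinality rule
# (73 distinct values were flagged above).
HIGH_CARDINALITY_THRESHOLD = 50


def categorical_insights(name: str, values: list) -> list[str]:
    """Return human-readable flags for one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    flags = []
    missing = len(values) - len(present)
    if missing:
        pct = 100 * missing / len(values)
        flags.append(f"{name} has {missing} ({pct:.2f}%) missing values")
    distinct = len(set(present))
    if distinct > HIGH_CARDINALITY_THRESHOLD:
        flags.append(f"{name} has a high cardinality: {distinct} distinct values")
    lengths = {len(str(v)) for v in present}
    if len(lengths) == 1:
        flags.append(f"{name} has constant length {lengths.pop()}")
    if present and distinct == len(values):
        # Every row holds a different, non-missing value.
        flags.append(f"{name} has all distinct values")
    return flags
```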

Variables


index

numerical

Approximate Distinct Count 32951
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 527216 B
Mean 16475
Minimum 0
Maximum 32950
Zeros 1
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • index is uniformly distributed

Quantile Statistics

Minimum 0
5th Percentile 1647.5
Q1 8237.5
Median 16475
Q3 24712.5
95th Percentile 31302.5
Maximum 32950
Range 32950
IQR 16475

Descriptive Statistics

Mean 16475
Standard Deviation 9512.2787
Variance 9.0483e+07
Sum 5.4287e+08
Skewness 0
Kurtosis -1.2
Coefficient of Variation 0.5774
  • index is not normally distributed (p-value 4.22651406964379e-25)
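The index column holds exactly the integers 0, 1, ..., 32950, so its quantiles and spread can be checked directly. The sketch below uses the Python standard library. It assumes the report uses linear-interpolation quantiles and the sample (n - 1) standard deviation, and both assumptions reproduce the figures above. The coefficient of variation is the standard deviation divided by the mean.

```python
import statistics

# The index column: 0, 1, ..., 32950.
index = list(range(32951))

# method="inclusive" interpolates linearly between order statistics.
vigintiles = statistics.quantiles(index, n=20, method="inclusive")
quartiles = statistics.quantiles(index, n=4, method="inclusive")
mean = statistics.fmean(index)
sd = statistics.stdev(index)  # sample standard deviation (divides by n - 1)
cv = sd / mean                # coefficient of variation

print(vigintiles[0], quartiles, vigintiles[-1])  # 5th pct, [Q1, median, Q3], 95th pct
print(round(sd, 4), round(cv, 4))
```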

product_id

categorical

Approximate Distinct Count 32951
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 3196247 B

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row 1e9e8ef04dbcff4541...
2nd row 3aa071139cb16b67ca...
3rd row 96bd76ec8810374ed1...
4th row cef67bcfe19066a932...
5th row 9dc1a7de274444849c...

Letter

Count 395107
Lowercase Letter 395107
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 659325
  • product_id contains many words: 32951 words
  • product_id has words of constant length
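The character counts add up to the column's total length: 395107 lowercase letters + 659325 decimal digits = 1,054,432 = 32 × 32951. Every product_id therefore consists only of lowercase letters and digits. The category labels above match Unicode general category names, so one plausible way to reproduce them is `unicodedata.category`. This is an assumption about the tool. The helper name `character_categories` is also hypothetical.

```python
import unicodedata
from collections import Counter

# Unicode general category codes mapped to the labels used in the report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def character_categories(values: list[str]) -> dict[str, int]:
    """Count characters per tracked Unicode category across all values."""
    counts = Counter(unicodedata.category(ch) for value in values for ch in value)
    # Report every tracked category, including those with a zero count.
    return {label: counts.get(code, 0) for code, label in CATEGORY_NAMES.items()}
```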

product_category_name

categorical

Approximate Distinct Count 73
Approximate Unique (%) 0.2%
Missing 610
Missing (%) 1.9%
Memory Size 2585943 B

Length

Mean 14.9587
Standard Deviation 5.8236
Median 15
Minimum 3
Maximum 46
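The Length block summarizes the number of characters in each non-missing value. The following is a minimal sketch that computes the same five figures. It assumes missing values (None) are excluded, as the 610 missing product_category_name entries presumably are. It also assumes the sample standard deviation is used, which the report does not state. The helper name `length_stats` is hypothetical.

```python
import statistics


def length_stats(values: list) -> dict[str, float]:
    """String-length summary for one column, skipping missing (None) values."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "mean": statistics.fmean(lengths),
        # Sample standard deviation (an assumption); 0 for a single value.
        "std": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        "median": statistics.median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }
```

For example, `length_stats(["perfumaria", "artes", "esporte_lazer", "bebes", None])` summarizes the lengths 10, 5, 13 and 5.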

Sample

1st row perfumaria
2nd row artes
3rd row esporte_lazer
4th row bebes
5th row utilidades_domesti...

Letter

Count 452813
Lowercase Letter 452813
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 95

product_name_lenght

numerical

Approximate Distinct Count 66
Approximate Unique (%) 0.2%
Missing 610
Missing (%) 1.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 517456 B
Mean 48.4769
Minimum 5
Maximum 76
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • product_name_lenght is skewed left (γ1 = -0.9032)

Quantile Statistics

Minimum 5
5th Percentile 29
Q1 42
Median 51
Q3 57
95th Percentile 60
Maximum 76
Range 71
IQR 15

Descriptive Statistics

Mean 48.4769
Standard Deviation 10.2457
Variance 104.9752
Sum 1.5678e+06
Skewness -0.9032
Kurtosis 0.1923
Coefficient of Variation 0.2114
  • product_name_lenght is not normally distributed (p-value 4.856933679031091e-08)
  • product_name_lenght has 290 outliers
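The skewness γ1 is the third central moment divided by the second raised to the power 3/2. A negative value, as for product_name_lenght (γ1 = -0.9032), indicates a longer left tail. The Kurtosis rows appear to report excess kurtosis (fourth moment over squared second moment, minus 3). The index column's -1.2 is exactly the excess kurtosis of a uniform distribution, which supports this reading. The following is a sketch using the uncorrected moment estimators. This is an assumption: the report may apply a small-sample bias correction, which matters little at n ≈ 32,000.

```python
import math


def skew_kurt(values: list[float]) -> tuple[float, float]:
    """Moment-based skewness (g1) and excess kurtosis (g2), no bias correction."""
    n = len(values)
    mean = math.fsum(values) / n
    # Central moments; math.fsum keeps the sums accurate for large n.
    m2 = math.fsum((x - mean) ** 2 for x in values) / n
    m3 = math.fsum((x - mean) ** 3 for x in values) / n
    m4 = math.fsum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis
```

Applied to `[float(x) for x in range(32951)]`, this gives a skewness of 0 and an excess kurtosis close to -1.2, matching the index column.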

product_description_lenght

numerical

Approximate Distinct Count 2960
Approximate Unique (%) 9.2%
Missing 610
Missing (%) 1.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 517456 B
Mean 771.4953
Minimum 4
Maximum 3992
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • product_description_lenght is skewed right (γ1 = 1.962)

Quantile Statistics

Minimum 4
5th Percentile 150
Q1 339
Median 595
Q3 972
95th Percentile 2063
Maximum 3992
Range 3988
IQR 633

Descriptive Statistics

Mean 771.4953
Standard Deviation 635.1152
Variance 403371.3486
Sum 2.4951e+07
Skewness 1.962
Kurtosis 4.828
Coefficient of Variation 0.8232
  • product_description_lenght is not normally distributed (p-value 0.0004921976016495151)
  • product_description_lenght has 2078 outliers

product_photos_qty

numerical

Approximate Distinct Count 19
Approximate Unique (%) 0.1%
Missing 610
Missing (%) 1.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 517456 B
Mean 2.189
Minimum 1
Maximum 20
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • product_photos_qty is skewed right (γ1 = 2.1933)

Quantile Statistics

Minimum 1
5th Percentile 1
Q1 1
Median 1
Q3 3
95th Percentile 6
Maximum 20
Range 19
IQR 2

Descriptive Statistics

Mean 2.189
Standard Deviation 1.7368
Variance 3.0164
Sum 70794
Skewness 2.1933
Kurtosis 7.2622
Coefficient of Variation 0.7934
  • product_photos_qty is not normally distributed (p-value 4.4636510381380114e-21)
  • product_photos_qty has 849 outliers
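The report does not name its outlier rule. One common choice is Tukey's fences, which flag values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. For product_photos_qty, Q1 = 1, Q3 = 3 and IQR = 2, so the upper fence is 3 + 1.5 × 2 = 6. Under this assumed rule, any product with more than 6 photos would count as an outlier. The following is a minimal sketch using linear-interpolation quartiles. The helper name `tukey_outliers` is hypothetical.

```python
import statistics


def tukey_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]
```

With `[1, 1, 1, 1, 2, 3, 3, 20]`, the quartiles are 1 and 3, the fences are -2 and 6, and only 20 is returned.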

product_weight_g

numerical

Approximate Distinct Count 2204
Approximate Unique (%) 6.7%
Missing 2
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 527184 B
Mean 2276.4725
Minimum 0
Maximum 40425
Zeros 4
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • product_weight_g is skewed right (γ1 = 3.6047)

Quantile Statistics

Minimum 0
5th Percentile 105
Q1 300
Median 700
Q3 1900
95th Percentile 10850
Maximum 40425
Range 40425
IQR 1600

Descriptive Statistics

Mean 2276.4725
Standard Deviation 4282.0387
Variance 1.8336e+07
Sum 7.5007e+07
Skewness 3.6047
Kurtosis 15.1311
Coefficient of Variation 1.881
  • product_weight_g is not normally distributed (p-value 9.586001466631602e-23)
  • product_weight_g has 4551 outliers

product_length_cm

numerical

Approximate Distinct Count 99
Approximate Unique (%) 0.3%
Missing 2
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 527184 B
Mean 30.8151
Minimum 7
Maximum 105
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • product_length_cm is skewed right (γ1 = 1.7504)

Quantile Statistics

Minimum 7
5th Percentile 16
Q1 18
Median 25
Q3 38
95th Percentile 65
Maximum 105
Range 98
IQR 20

Descriptive Statistics

Mean 30.8151
Standard Deviation 16.9145
Variance 286.0989
Sum 1.0153e+06
Skewness 1.7504
Kurtosis 3.5129
Coefficient of Variation 0.5489
  • product_length_cm is not normally distributed (p-value 1.3783347546141773e-11)
  • product_length_cm has 1380 outliers

product_height_cm

numerical

Approximate Distinct Count 102
Approximate Unique (%) 0.3%
Missing 2
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 527184 B
Mean 16.9377
Minimum 2
Maximum 105
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • product_height_cm is skewed right (γ1 = 2.14)

Quantile Statistics

Minimum 2
5th Percentile 3
Q1 8
Median 13
Q3 21
95th Percentile 44
Maximum 105
Range 103
IQR 13

Descriptive Statistics

Mean 16.9377
Standard Deviation 13.6376
Variance 185.9829
Sum 558079
Skewness 2.14
Kurtosis 6.6774
Coefficient of Variation 0.8052
  • product_height_cm is not normally distributed (p-value 1.3938173325209465e-05)
  • product_height_cm has 1892 outliers

product_width_cm

numerical

Approximate Distinct Count 95
Approximate Unique (%) 0.3%
Missing 2
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 527184 B
Mean 23.1967
Minimum 6
Maximum 118
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • product_width_cm is skewed right (γ1 = 1.6709)

Quantile Statistics

Minimum 6
5th Percentile 11
Q1 15
Median 20
Q3 30
95th Percentile 47
Maximum 118
Range 112
IQR 15

Descriptive Statistics

Mean 23.1967
Standard Deviation 12.079
Variance 145.9034
Sum 764309
Skewness 1.6709
Kurtosis 4.0723
Coefficient of Variation 0.5207
  • product_width_cm is not normally distributed (p-value 1.703233438190457e-11)
  • product_width_cm has 912 outliers

Interactions

Correlations

Missing Values
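The per-variable sections show 610 missing values in each of four columns: product_category_name, product_name_lenght, product_description_lenght and product_photos_qty. They also show 2 missing values in each of the four dimension columns: product_weight_g, product_length_cm, product_height_cm and product_width_cm. Whether these gaps fall in the same rows can be checked by grouping rows by their set of missing columns. The following is a minimal sketch that assumes rows are dicts with None for missing cells. The helper name `missing_patterns` is hypothetical.

```python
from collections import Counter


def missing_patterns(rows: list[dict]) -> Counter:
    """Count rows by the (sorted) tuple of columns that are missing in them."""
    patterns: Counter = Counter()
    for row in rows:
        missing = tuple(sorted(col for col, value in row.items() if value is None))
        if missing:  # skip fully populated rows
            patterns[missing] += 1
    return patterns
```

A single dominant pattern covering 610 rows would indicate that the four descriptive columns are missing together.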